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£NJ . Abstract. A two- groups mixed-effects modei for the comparison of 

£H | (normalized) microarray data from two treatment groups is considered. 

0$ ■ Most competing parametric methods that have appeared in the litera- 

ture are obtained as special cases or by minor modification of the pro- 
posed model. Approximate maximum likelihood fitting is accomplished 
via a fast and scalable algorithm, which we call LEMMA (Laplace ap- 
proximated EM Microarray Analysis) . The posterior odds of treatment 
x gene interactions, derived from the model, involve shrinkage esti- 
mates of both the interactions and of the gene specific error variances. 
Genes are classified as being associated with treatment based on the 
posterior odds and the local false discovery rate (f.d.r.) with a fixed 
cutoff. Our model-based approach also allows one to declare the non- 
null status of a gene by controlling the false discovery rate (FDR). It 
is shown in a detailed simulation study that the approach outperforms 
^5 \ well-known competitors. We also apply the proposed methodology to 

■ two previously analyzed microarray examples. Extensions of the pro- 
posed method to paired treatments and multiple treatments are also 
discussed. 

Key words and phrases: EM algorithm, empirical Bayes, Laplace ap- 
proximation, LEMMA, LIMMA, linear mixed models, local false dis- 
covery rate, microarray analysis, mixture model, two-groups model. 
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Microarray technologies have become a major data 
generator in the post-genomics era. Instead of work- 



allow scientists to view the expression of thousands 
of genes from an experimental sample simultane- 
ously. Due to the cost, it is common that thousands 
of genes are measured with a small number of repli- 
cations, as a consequence, one faces a large G, small 
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with prior probability po or p\ = 1 — po , respectively. 
The two-groups model plays an important role in 
the Bayesian microarray literature and is broadly 
applicable (Efron, 2008). 

A general review of issues pertaining to microar- 
ray data analysis is provided in Allison et al. (2006). 
Here, we focus on statistical inference and, in partic- 
ular, on what Allison et al. (2006) refer to as "con- 
sensus points 2 and 3": the advantages of shrinkage 
methods, and controlling the false discovery rate. 
We review several inferential methods, and develop 
a unifying linear model approach. 

Classical parametric statistics do not provide a 
reliable methodology for determining differentially 
expressed genes. The large number of genes with 
relatively few replications in typical microarray ex- 
periments yield variance estimates of the expres- 
sion levels that are often unreliable. The classical 
t-test and F-test are generated under a heteroge- 
neous error variance model assumption and do not 
enjoy the advantage gained by shrinkage estimation. 
The assumption that the variances are equal across 
all genes is typically not realistic. Hypothesis tests 
based on a pooled common variance estimator for all 
genes have low power and can result in misleading 
differential expression results (Wright and Simon, 
2003; Smyth, 2004; Cui et al., 2005). 

An important observation is that, although there 
are only a few replications for each gene, the total 
number of measurements is very large. If informa- 
tion is combined across the genes (i.e., genome- wide 
shrinkage), it is possible to construct test proce- 
dures that have improved performance. The SAM 
test (Tusher, Tibshirani and Chu, 2001) and a reg- 
ularized t-test in Efron et al. (2001) first used infor- 
mation across the genome-wide expression values by 
the addition of a data-based constant to the gene- 
specific standard errors. 

The Bayesian approach seems to be particularly 
well suited for combining information in expression 
data. Hierarchical Bayesian models have also been 
used for variance regularization by estimating mod- 
erated variances of individual genes. The estimated 
variances are calculated as weighted averages of the 
gene-specific sample variances and pooled variances 
across all genes. In particular, the regularized i-test 
proposed by Baldi and Long (2001) uses a hier- 
archical model and substitutes an empirical Bayes 
variance estimator based on a prior distribution in 
place of the usual variance estimate. Another hier- 
archical approach was developed in Newton et al. 



(2001) for detecting changes of gene expression in 
a two-channel cDNA microarray experiment. This 
was extended to replicate chips with multiple condi- 
tions using a hierarchical lognormal-normal model 
in Kendziorski et al. (2003). A key difference be- 
tween these models and those discussed above is that 
they effectively induce shrinkage in the mean effects 
(i.e., the numerator of the t-statistic), while assum- 
ing homogeneous variability across genes. 

Instead of directly modeling the variation of the 
expression data, two-groups models are character- 
ized by mixing measurements over latent gene-specific 
indicators. Lonnstedt and Speed (2002) used this 
approach to derive the so-called 5-statistic as the 
logarithm of the posterior odds of differential ex- 
pression. Smyth (2004) extended the .B-statistic to 
the linear models setting and has written the widely 
used limma R package (R Development Core Team, 
2007). Smyth (2004) also shows that the 5-statistic 
is a monotone function of a i-statistic with a regu- 
larized variance which he refers to as a moderated 
t-statistic. Wright and Simon (2003) and Cui et al. 
(2005) derive similarly moderated statistics, and Cui 
et al. (2005) showed that their proposed test, us- 
ing a James-Stein type variance estimator, had the 
best or nearly the best power for detecting differen- 
tially expressed genes over a wide range of situations 
compared to a number of existing alternative proce- 
dures. 

Since the performance of the -F-type test statistics 
arising from models with a random gene-specific er- 
ror variance (leading to shrinkage estimates of the 
error variances) is better than in the case where the 
variances are fixed, why only model the variances as 
random but not the means? In effect, the approach 
of Lonnstedt and Speed (2002), and its extension 
in Smyth (2004), already do this by treating both 
the gene-specific mean effects and error variances as 
random. These models have been further general- 
ized by Tai and Speed (2006, 2009) to the multi- 
variate setting to handle, for example, short time- 
series of microarrays. These authors coined the term 
"fully moderated" for such models. However, as we 
point out later is Section 4, the specific distribu- 
tional assumptions made in these models imply that 
the shrinkage factor for the mean effects is the same 
for all genes, resulting in performance equivalent to 
the ordinary moderated-i. 

Hwang and Liu (2010) proposed an alternative 
empirical Bayes approach which shrinks both the 
means and variances differentially (see also Liu, 2006). 
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Their simulation studies indicate that their fully 
moderated procedure is more powerful than all the 
other tests existing in the literature. The Hwang and 
Liu (2010) procedure uses method of moments esti- 
mators of some model parameters rather than max- 
imum likelihood. The advantage of our EM fitting 
algorithm is that it is easily extended to more gen- 
eral models, for example, including covariates, or the 
three groups mixture model discussed in Section 3.4. 
Still, their approach provided the key insight that 
motivated the model formulation and subsequent 
computational algorithm described in this article. 

The development of the empirical Bayes method- 
ologies that improve the power to detect differen- 
tially expressed genes essentially reduces to the choice 
of whether gene-specific effects should be modeled as 
fixed or random. This question applies to effects on 
both the mean and the error variance. Thus, there 
are four combinations of fixed and random factors 
leading to four models which we denote by FF, RF, 
FR and RR, where the first letter identifies whether 
the mean effects are fixed or random and the second 
letter does the same for the error variances. Two ad- 
ditional models, denoted FH and RH, are obtained 
if the error variances are assumed to be homoge- 
neous across genes. The FF category corresponds to 
the naive approach of applying t- or F-tests to each 
gene separately. The FR category includes the mod- 
els in Wright and Simon (2003) and Cui et al. (2005). 
The gamma-gamma and log-normal-normal models 
of Newton et al. (2001) and Kendziorski et al. (2003) 
are of the RH type. The approach of Hwang and Liu 
(2010) falls in the RR category. Table 1 summarizes 



how previously proposed statistics fall into the six 
model categories. Note that the RR category also in- 
cludes the LIMMA model. However, inference with 
the i?-statistic of Lonnstedt and Speed (2002) and 
Smyth (2004) results in a shrinkage factor for the 
mean effects which is the same for all genes. Conse- 
quently, LIMMA is therefore similar to an FR-type 
model in terms of frequentist performance since the 
posterior odds are monotone in the moderated t- 
statistic. 

In this paper we present a unified modeling frame- 
work for empirical Bayes inference in microarray ex- 
periments together with a simple and fast EM algo- 
rithm for estimation of the model parameters. We 
focus on a simple two-condition experimental setup, 
but the ANOVA formulation we posit in the next 
section allows for easy generalization to more than 
two conditions and comparisons based on a single 
sample of two channel arrays such as the more gen- 
eral designs in Kerr, Martin and Churchill (2000) 
and Smyth (2004). The methods of this article can, 
in principle, also be extended to a multivariate em- 
pirical Bayes model, for example, to analyze short 
time-course data as in the extension of the £?-statistic 
by Tai and Speed (2006, 2009), or to multiple ar- 
ray platforms as is used in epigenomic data analysis 
(Figueroa et al., 2008). 

We apply an approximate EM algorithm for fitting 
the proposed model, with the latent null/non-null 
status of each gene playing the role of missing data. 
The integral needed to evaluate the complete data 
likelihood makes direct application of the EM algo- 
rithm intractable. However, a simple and accurate 



Table 1 

Models corresponding to combinations of fixed and random factors 



Mean effect Error variance Methods 



Fixed 


Fixed (Heterogenous) 


t-test/F-test 


Fixed 


Fixed (Homogenous) 


F s in Cui and Churchill (2003) 


Fixed 


Random 


Wright and Simon (2003), 
Cui et al. (2005), 
Lonnstedt and Speed (2002), 
Smyth (2004) 


Random 


Fixed (Heterogenous) 




Random 


Fixed (Homogenous) 


Newton et al. (2001), 
Kendziorski et al. (2003) 


Random 


Random 


Fss m Hwang and Liu (2010), 
Lonnstedt, Rimini and Nilsson (2005), 
Tai and Speed (2009), 
Lonnstedt and Speed (2002), 
Smyth (2004) 
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approximation is obtained via the Laplace approx- 
imation (de Bruijn, 1981, Chapter 4; Butler, 2007, 
page 42). This approximation makes the EM algo- 
rithm scalable, tractable, and extremely fast. Imple- 
mentation of Bayesian microarray models typically 
involves drawing MCMC samples from the poste- 
rior distribution of effects from all genes. MCMC 
sampling provides a mechanism to study the full 
Bayesian posterior distribution. However, there is a 
heavy computational burden that makes the MCMC 
implementation less attractive. The Laplace approx- 
imation circumvents the generation of the thousands 
of gene effect parameters and gives a highly accurate 
approximation to the integral in the expression of 
the complete data likelihood. The Laplace approx- 
imated EM algorithm based analysis is the inspi- 
ration of the acronym LEMMA (Laplace approxi- 
mated EM Microarray Analysis) for the contributed 
R package, lemma (Bar and Schifano, 2009), which 
implements the methodology described in this pa- 
per. 

The paper is organized as follows. In Section 2 
we introduce the necessary notation for our two- 
groups model along with the prior distribution spec- 
ifications. Section 3 describes the approximate EM 
algorithm for fitting the RR model. We also pro- 
pose a generalization of the LIMMA model and show 
how the EM algorithm is easily modified to estimate 
its parameters, and we briefly discuss extensions to 
multiple treatments and to a three-groups model. In 
Section 4 we show that the posterior probability that 
a gene is non-null is a function of a fully-moderated 
(in the sense of Hwang and Liu, 2010) posterior t- 
statistic with shrinkage in both the numerator and 
the denominator. We show that our RR framework 
generalizes several other statistics, and describe two 
inferential procedures, one based on the posterior 
probability that a gene is non-null, and one which is 
based on the null distribution and the FDR proce- 
dure. Section 5 gives results of a simulation study in 
which we compare the performance of various meth- 
ods to the "Optimal Rule" procedure based on full 
knowledge of the true model and its parameters. Our 
proposed methodology is applied to two well-known 
microarray examples: the ApoAl data (Callow et 
al., 2000) and the Colon Cancer data (Alon et al., 
1999) in Section 6. We conclude the article in Sec- 
tion 7 with a discussion. 



2. MODEL AND NOTATION 

Let yijg denote the response (e.g., log expression 
ratio) of gene g, for subject (replicate) j, in treat- 
ment group i = l,2. We begin with the linear model, 

(1) y ijg =fi + Ti +7g +1p ig + £y g , 

with a typical assumption concerning the errors be- 
ing 

(2) £ ijg ~Li.d. N(0,al g ) 

for j = l,...,riig, independently across genes and 
treatment groups. We impose the identifiability con- 
straints, n + T2 = and ipi g + -023 = for all g = 
1, . . . , G. Then r = n — t<i is the main effect of treat- 
ment, averaged across genes, and ip g = ipi g — V>2g , 
g = 1, . . . , G, are the gene specific treatment effects. 
Note that we do not assume that the mean treat- 
ment effect is zero. While assuming r = is often 
reasonable when performing differential gene expres- 
sion analysis on large microarray data sets, we find 
this to be not only an unnecessary constraint, but 
also unrealistic in certain situations. For example, 
when a data set consists mostly of genes that are 
known to be differentially expressed, or when com- 
paring expression levels across species (where "treat- 
ment" is interpreted as "species" ) , there is no reason 
to assume that the overall mean difference between 
the two treatment groups is zero. 

We further suppose that the genes fall into two 
groups, a null group in which ip g = and a non- 
null group in which ifj g 7^ 0. The primary goal is 
to classify genes as null or non-null based on the 
observed responses. A probabilistic approach is to 
suppose that each gene has prior probability p\ of 
being non-null (and po = 1 — p\ of being null) and to 
use Bayes rule to determine the posterior probability 
given the data; specifically, 

/o\ / x Pih,g{y g ) 

3 PiAVg) = f t \ t t v 

PofoAygJ+PihAVg) 

where fi !g (y g ) is the probability density of the re- 
sponses for gene g implied by the non-null model, 
and fo t g{y g ) is the corresponding quantity if the gene 
is in the null group. 

In practice, of course, the mixture probability and 
the parameters that determine the null and non-null 
densities have to be estimated. This estimation step 
depends upon additional assumptions, if any, that 
are made about the distribution of the responses. 
As noted in the Introduction, a basic question is 
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whether gene-specific effects should be modeled as 
fixed or random, leading to the model categories 
we denote by FF, RF, FR and RR, and two addi- 
tional models, FH and RH, obtained when the error 
variances are assumed to be homogeneous, that is, 

a lg = a s- 

The ANOVA model (1) together with the distri- 
butional assumption (2) allows us to restrict at- 
tention to the sum and difference of gene-specific 
treatment means, respectively, s g = y\. g + yi-g and 
d g = yi-g — Vi-gi and the gene-specific mean squared 
errors, 

2 nig 
i=l j=l 

where f g = n\ g + ni g — 2. Notice that s g \g ~ N(2/i + 
2j g ,a 2 ,), where a 2 g = a 2 g (l/n lg + l/n 2g ), and \g de- 
notes conditioning on any gene-specific random ef- 
fects. It follows that s g carries no information about 
the gene-specific treatment effect ip g . For this rea- 
son, our estimation procedures use only the marginal 
likelihood based on the data ({d g } , {m g }) . The model 
(1) together with assumption (2) also implies that d g 
and m g are conditionally independent, with d g \g ^ 
(1 — b g )No + b g Ni independently of m g \g ~ &c,gXf g / 
f g , where b g , g = l,...,G, denotes independent 
Bernoulli (pi) latent indicators of non-null status for 
the G genes, Nq and N± denote normal variates with 
unequal means r and r + ip g ^ r respectively, but 
equal variances a g , and Xf g denotes a chi-squared 
variate with f g degrees of freedom. 

The family of parametric models considered in 
this paper is completed by specifying distributions 
for the gene-specific effects, {ip g } and {<7^ 9 }. In what 
follows we suppose that, if the (non-null) gene-specific 
effects are modeled as random variates, they follow 
a normal distribution, 

(4) r/. ff ~id.d.iV(^4). 

On the other hand, if the gene-specific variances are 
modeled as random variates, they are drawn from 
an inverse gamma distribution, 

(5) a~ g ~ i.i.d. Gamma(a, /3), 

where a and /3 are shape and scale parameters. We 
refer to the RR model specified by (1), (2) and (5) 
with the non-null gene-specific effects (4) as the 
LEMMA model. 

It is worth contrasting (4) with the corresponding 
assumption in the models leading to the £>-statistic 



given in Lonnstedt and Speed (2002) and Smyth 
(2004), where the mean of the random effects dis- 
tribution is assumed to be zero. In a classical (one 
group) normal mixed-model, the mean of the ran- 
dom effect is assumed to be zero because it is not 
separately identifiable from the overall mean. How- 
ever, in the two-groups setting in which ip g in (1) 
is modeled as a mixture, assuming ip ^ in (4) 
poses no such identifiability problems. Furthermore, 
this additional parameter allows for two useful and 
important extensions of the model: (a) to paired 
(within-group) analyses, and (b) to three-groups al- 
lowing for over- and under-expressed non-null sta- 
tus. These extensions are described in more detail 
in Section 3.4. 

3. ESTIMATION 

In this section we describe in detail an approxi- 
mate EM algorithm for fitting the LEMMA model. 
Estimation for the other five models can be carried 
out by making appropriate modifications to this al- 
gorithm. The LEMMA model has six parameters, 
two being the shape and scale of the distribution for 
the error variances given in (5). The remaining vec- 
tor of parameters is (p\, r, ip, a^) which we denote 
by (f>. 

Estimates of the hyperparameters, a and /3, are 
obtained by maximizing the marginal likelihood based 
on {m g }, given by 

L({m g }) 

& poo 

= 11/ f^gWeJf^e^dcr'g 

0=1 Jo 

(6) 

r(/ g /2 + q) 
' (m g f g /2 + l/(3)f 3 /^' 

In practice, we find the maximum likelihood esti- 
mates for a and (3 using the nlminb function in 
R. In all the simulations and case studies the func- 
tion converged quickly. Since the marginal likelihood 
is based on the statistics {m g }, the computation 
time depends only on the number of genes, G, but 
not on the sample sizes. We have also derived and 
implemented moment estimators [similar to Smyth 
(2004), who comments that {m g } follow a scaled 
F-distribution] , and we found that both methods 
provide accurate estimation of a and /3. 
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3.1 EM Algorithm 

We apply the EM algorithm to estimate (ft, with 
the latent indicators, {b g }, playing the role of the 
missing data. Since d g and m g are conditionally in- 
dependent given (b g ,(jg ), the complete data likeli- 
hood for 4> based on ({bg},{d g },{m g }) is 



(7) 



G n 

Lc(0) = H / L(b g ,d g ;a 

9=1 J 



2 ) 



■ L(m g ;al g )f{a £ ^)da i 



e,9' 



where f(cr~ g ) represents the gamma density with 
shape a and scale f3. 

The integral in (7) makes direct application of the 
EM algorithm intractable. However, a simple and 
accurate approximation is obtained via the Laplace 
approximation (de Bruijn, 1981, Chapter 4; Butler, 
2007, page 42) 



G 



Lc{4>) ~L C (4>) = Y[L(b g ,d g ;al g )L(m g ;a 2 £ 

9=1 

W , 

.f(a~ 2 g )^-27T/i"(m g ;al g ), 

where £"(m g ;a 2 g ) is the second derivative of 
log L(m g ; (Teg) with respect to a 2 g , and a 2 g is the 
posterior mode of a 2 g given m g , given by 



(9) 



^ f g /2 + a + l 



+ 



a + l 



1 



/ 3 /2 + a + l (a + l)/3' 

Notice that the last three factors on the right-side of 
(8) do not involve the parameter <p and can therefore 
be ignored in the implementation of EM. In practice, 
we replace a and f3 by their maximum likelihood 
estimates obtained from the marginal likelihood in 
(6). 

Denote the estimate after m iterations of EM by 
(fr( m \ The (m + l)st E-step consists of taking the 
conditional expectation of the logarithm of (7) given 
the observed data, using the current estimate, (jy> m > . 
Using the Laplace approximation (8), this is given 
by 

Q{M {m) ) 

= E^m) [log L c ((t>)\{d g }, {m g }} 
ra-E^cm) [\ogL c {(t))\{d g },{m g }] 



G 

(10) =Y J E 4>{ m ) {\ogL{b g ,d g -dl g )\d g } + C 

9=1 
G 

= IZ^S )lo gbo/o, 9 (dc,)] 

9=1 

+ p^ g ) log\p 1 f 1 , g (d g )]} + C 

= Q(«^M) + C, 

where C does not depend on 0, /o iS and f\, g denote 
N(t, (Tg) and N(t + ip, a 2 ^ + a 2 ) densities with a 2 = 

a 2 g (l/n lg + l/n 2g ),pi, g = E(b g \d g ) and p , s +Pi,g = 
1. 

The M-step at the (m + 1) iteration requires maxi- 
mization of Q(4>, (p^) with respect to (f> to yield the 
updated estimate 4>( m+1 \ That is, 



arg max 



This leads to the following maximum likelihood es- 
timate update equations for p±, r and if): 

G 



(m+l) 
p\ 



— v» (m) 

9=1 



(12) 
and 
(13) 



V G n {m) d la 2 
2^g=lPo,g / g 



^(m+l) 



E? =1 £ ) /(-; (m) + 



or, 



while the update for <r^ is the solution of the equa- 



tion 



(14) 



G 

r>) !_ 

9=1 



°* + CT 9 

« (m) (dg _ r (m+l)_^(m+l) ) 2 

2^1,9 
9=1 



and cr^ = if = for all the genes. 

Strictly speaking, the update for ip in (13) is con- 
ditional on the current value of a 2 ^. However, we 
have found this variant of EM to have almost iden- 
tical convergence properties to the full EM in which 
Q is maximized jointly with respect to all four com- 
ponents of 4>. 
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3.2 Modifications for RF, RH, FF, FH, FR 

LEMMA is considered an RR model because the 
gene-specific effects (tpg,^ >g ) are modeled as ran- 
dom variates. By considering one or both of these as 
fixed effects, we obtain models that fall into one of 
the RF, RH, FR, FF or FH categories. Henceforth, 
the category labels RF, RH, FF, FH, FR refer to 
the models derived from the LEMMA (RR) model 
with the corresponding fixed/random distributional 
assumption modifications. 

The complete data likelihood for the RF model is 



G 



(15) Lc{4>)tt ]^L(& g ,d s ;c7£ iS )L(m g ;o-; 



Since no integration is required to evaluate this like- 
lihood, the Laplace approximation is not needed in 
this case. As with the LEMMA (RR) model, we first 
estimate the error variances, {cr^ g }, separately us- 
ing the marginal likelihood for {m g }. This results 
in the simple estimate, a 2 g = m g . The EM algo- 
rithm for estimating <fi then proceeds in an iden- 
tical manner except that a g is replaced by a g = 
o\ g {1/nig + l/ U2 g ) . The algorithm for the RH model 
is also similar with the marginal likelihood estima- 
tor of the homogeneous error variance given by a\ = 

Eg m gfg/E g fg- 

For all the fixed gene-specific effects models (FR, 
FF and FH) it is easily verified that d g — — 

ipg = 0. This implies that the EM update for the 
mixing parameter satisfies 



(m+1) 

Pi 



G 



(m) 
Pi 



(m) 



= Ly 

G ^ exp{-(d g - rH)V2^} +p 
>P ( ?\ 

where a 2 e g represents the appropriate a g estima- 
tor for the desired model. As a result, the EM se- 
quence for pi always converges to 1, regardless of 
the starting value. An explanation for this behavior 
is that the mixture probability is not identifiable if 
the gene-specific effects are fixed. 

3.3 A Generalization of LIMMA 

The LIMMA model proposed by Smyth (2004) 
is similar to the LEMMA model described in Sec- 
tion 2. A key difference is the assumption concerning 
the random gene-specific effects given in (4). The 



corresponding assumption in LIMMA is ip g \cr 2 g ~ 
N(0,voa 2 g ). This assumption, combined with (5), 
results in a closed form expression for the complete 
data likelihood (7), rendering the use of the Laplace 
approximation unnecessary. Another difference is that 
the mean effect of treatment, averaged across genes 
(r), is assumed to be zero in the LIMMA model. 
However, this difference has little bearing on the ar- 
guments that follow. 

As noted in Section 2, it is unnecessary to assume 
that the mean of the non-null gene-specific effects, 
if), is zero. Hence, we consider a generalized LIMMA 
model (denoted by RG in what follows) with 



(16) 



2 

£,g> 



for the non-null gene-specific effects, and, as such, it 
falls into the RR category. The EM algorithm dis- 
cussed earlier in this section can be implemented 
to fit this generalized model with minor modifica- 
tions. Specifically, after using the Laplace approxi- 
mation, the Q-function has the same form as (10) 
with VQ tg a1 g replacing cr^ + a g as the variance in 
the non-null density where vq i9 = v o + l/n\ g + 
\jri2g- This leads to update equations for p\ and r 
identical to (11) and (12), respectively. The update 
for tp is 



Y,UpTg ] id 9 -^ m+l) )/{voA) 



Am) 



EU (m) I, ~o \ 
s =lH, s 7Kfl< S ) 

and the update of vq satisfies 



) L _\^Jm) {ag- T " '~W ') 



\ ^(m) 1 

-0,9 g=l ' g fc , 



and = if p\ j9 = for all the genes. These updates 
simplify further if the sample sizes are the same for 
all genes. 

3.4 Model Extensions 

The LEMMA model is easily extended in a num- 
ber of useful ways. First, it enables within-group 
analysis which follows the same estimation proce- 
dure by simply dropping the i index and combin- 
ing the terms [i and r. We found this to be use- 
ful in practical applications, when, for example, re- 
searchers wish to perform a paired-sample test. 

Similarly, we can extend the model to have multi- 
ple treatment groups and test different (user-defined) 
contrasts, as was done in Smyth (2004) for the LIMMA 
model. Mathematically, this generalization is very 
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simple, and, in practice, when dealing with a small 
or moderate number of treatment groups, the es- 
timation procedure poses no significant computa- 
tional challenges. For example, we use the (t — 1- 
dimensional vector) summary statistics d g =HY 9 , 
where H is a contrast matrix (e.g., the Helmert ma- 
trix) and Y g = (Yi. g ,Y% g , . . . ,Yt. g )' . Note that the 
2x2 Helmert matrix gives the d g and s g statis- 
tics for the one-treatment case [scaled by a factor of 
l/\/2]. Obtaining the estimates and test statistics 
in the multiple treatment case is analogous to the 
derivations in (3.1). See the Appendix for details. 

As noted in Zhang, Zhang and Wells (2010), it is 
often the case that the probabilities of under- and 
over-expressed genes are not equal. The assumption 
that the distribution of the non-null genes has a 
nonzero mean [ip) can be modified to allow for multi- 
ple non-null components in the mixture distribution. 
For example, we might assume that each gene is ei- 
ther in the null group (ip g = 0) with probability po, 
in one non-null component with probability p\ with 
tp g ~ i.i.d. N(ip,a^), or in a second non-null group 
with probability p2 with ijj g ~ i.i.d. N(—ip, a 2 ^), where 
Po + Pi + P2 — 1 • The two-component model in the 
previous sections is the special case in which p 2 = 0. 
The lemma R package uses the three component mix- 
ture by default, and we have found that, indeed, 
when there are two mixture components, the EM 
algorithm converges to p2 = 0. Note that the R im- 
plementation assumes that the means of the non- 
null groups are of the same magnitude but opposite 
sign. This assumption can be relaxed, for instance, 
by assuming only that ipi < < ip2- 

4. INFERENCE 

The posterior probability that gene g is non-null 
is given by the expression (3). Its estimated value 
based on the LEMMA model can be expressed as a 
function of the likelihood ratio 

L o,g _ fo^ 
Ll >a h, 9 

= (2na 2 g r 1/2 ew{-(d 9 -f) 2 /2a 2 g } 

/([27r(4 + a s 2 )]-^ 
(17) • exp{-(d, - f - ^) 2 /2(a 2 + d 2 )}) 



-1/2 



•ex Pi -- 



1 [\ g (dg-T) + {l-\ 6 



oc 



a 9 



-1/2 



exp 



A 5 crf 



It 2 

2 9 



+ 



with the constant of proportionality being exp(i/j 2 / 



2a 2 ), where 



1 / 1 



1 



X 9 ~ 2 ( a 2 + J2 

a 9 \ a 9 a i> 



4+< 



The statistic T g is a posterior t-statistic, being the 
ratio of the estimated posterior expectation of ip g 
to its estimated posterior standard deviation. Note 
that the LEMMA model induces three forms of shrink- 
age in T g . The first two forms come from \ g > in 
both the numerator and the denominator. Third, 
(j 2 , a function of the posterior mode a 2 g , is itself a 
shrinkage estimator as a weighted compromise be- 
tween the usual error variance estimator m g and 
the mode of the inverse gamma distribution [(a + 

The likelihood ratio in (17) has the same form 
for the RF and RH models with a 2 g replaced by 



„ and a 2 , respectively in a 2 . [Recall that a 2 



a 2 g (l/nig + l/n2 S ).] Test statistics for the fixed mean 
effects models, FR, FF and FH, are obtained as lim- 
its of T g as X g — > 1 . 

It is interesting to compare the likelihood ratio 
(17) with the corresponding statistic under the 
LIMMA and RG model assumptions discussed in the 
previous section. For these models a 2 , is replaced by 



e,9 



and so the shrinkage coefficient becomes 



v + I /nig + l/n 2g 



In particular, if the sample sizes are the same for 
all genes, then the amount of shrinkage is the same 
for all genes. Furthermore, if ip is set equal to zero, 
as it is in LIMMA, then T g is proportional to the 
test-statistic for the FR model, 

„ d n —f 



°9 

This has the same form as the moderated t-statistic 
of Smyth (2004) and Wright and Simon (2003) ex- 
cept for the subtraction of the average gene effect, 
r, in the numerator and the use of the mode rather 
than the expected value of the posterior distribution 
of a 2 g given m g in the denominator. 
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For inference, we compare the posterior null prob- 
ability, 1 — pig in (3), with a local f.d.r. threshold 
to decide whether a gene is in the non-null group. 
Alternatively, our model-based approach also allows 
one to declare the non-null status of a gene by con- 
trolling the false discovery rate (FDR), using the 
Benjamini and Hochberg (1995) (BH) procedure for 
any given level, q* . Specifically, using the theoretical 
null-gene distributions of {d g }, which are assumed 
to be N(f,(Tg), we obtain the p- values for the ob- 
served {d g }. We denote the p- values by {P g }, and 
find the largest index g' for which P^ < q* x g' /G, 

where {Pg} is the sorted list of p- values. We declare 
all the genes with index smaller than or equal to g' 
(in the sorted list) as non-null, and the FDR the- 
orem guarantees that the expected false discovery 
rate is bounded by q* . 

5. SIMULATION STUDY 

In this section we assess the performance of several 
estimation/testing procedures mentioned in this pa- 
per under two data generation models, one accord- 
ing to the LEMMA model and the other accord- 
ing to the LIMMA model. In practice, the correct 
model is unknown, so our goal is to compare the 
power, accuracy, false discovery rate and parameter 
estimation for different true- model/procedure com- 
binations. In what follows we use the term "proce- 
dure" to define the combination of the model se- 
lected for analysis (which may or may not be the 
true model) and the estimation and inferential tech- 
niques derived from this model. 

5.1 Data Generation 

In both scenarios (LEMMA and LIMMA), we sim- 
ulated S = 100 data sets according to a mixture 
model with two groups, null and non-null. Each data 
set consisted of G = 2000 genes, of which p\G were 
non-null, and we used p\ =0.01,0.05,0.1,0.25. For 
each of the S data sets we drew G inverse gamma 
error variates with shape a and scale /?. By vary- 
ing a and f3, we adjusted the amount of error vari- 
ance variability present in the data. The values of 
a, /3, n\ g = ni, and ri2 g = ri2 were chosen so that 
mean(<7g) = 1. With n\ = = 6, we set a = 5 and 
/3 = 1/12 for the "low" error variance variability; we 
set a = 2.1 and (5 = 10/33 for the "high" error vari- 
ance variability. Hence, the standard deviation (and 
also the coefficient of variation, CV) of o 2 g for the 
former was and for the latter was \/l0- 



In the LEMMA-generated data, we varied tp, so 
that if) e {0, 1,2, ... ,6} = and set = 1. In the 
LIMMA data generation setup, we used vq G {g, §, 
. . . , | } to generate the non-null genes according to 
(16). For both generation schemes we set r = 0, as 
the LIMMA model does not involve r, and it is only 
estimable under the random gene by treatment in- 
teraction effect models (RR, RF, RH). We gener- 
ated yijg according to equations (1) and (2) with 
the above parameter specifications, and computed 
{dg} and {m g }. While we only present results for 
a selection of specific parameter value settings, nu- 
merous simulations were performed with a variety 
of sample sizes rii, i = 1, 2, non-null probabilities pi, 
and gene-specific treatment variances cr^. In addi- 
tion, we also considered using the log-normal dis- 
tribution to generate the error variance &1 rather 
than the inverse gamma distribution. We found the 
results to be qualitatively insensitive to these dif- 
ferent settings, and the results presented below por- 
tray an accurate summary of the performance of the 
methods. 

5.2 Data Analysis and Results 

We consider two metrics for determining null and 
non-null status of genes. The first method is based 
on computing empirical quantile critical values. Since 
the distribution of many of our test statistics is un- 
known, we defined a test-specific critical value, T c , 
as the 0.95 quantile among the 1900 x 100 null genes. 
By design, this resulted in an average size of 0.05 for 
each test. The average power for each procedure was 
determined by the proportion of non-null genes cor- 
rectly declared non-null based on the (test-specific) 
empirical critical value T c . Figure 1 shows the av- 
erage power (on the logit scale) of the likelihood 
ratio tests derived assuming the FF, FH, FR, RF, 
RH and RR models, with estimation procedures as 
described in Section 3. Also included in our com- 
parison were the RG likelihood ratio tests, derived 
from the model defined in Section 3.3, and the mod- 
erated t-tests obtained from the limma R package. 
Since in our simulations we know the exact values 
of the parameters, we also included the "Optimal 
Rule" statistics (denoted by OR) which were ob- 
tained by plugging in the true parameter values in 
the likelihood ratio statistic for the true data gener- 
ation model (either LEMMA or LIMMA). 

When the data are generated according to the 
LEMMA model our simulations show that the tests 
derived from the RR model achieved the highest 
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Fig. 1. Average power (on the logit scale) for empirical quantile analysis under the RR data generation model, with 
m — n2 = 6, S = 100 samples, G = 2000 genes, and pi = 0.05 probability of non-null status. Left; low error variance variability 
(CV = 0.58). Right: high error variance variability (CV = 3.16). 



power for all ip 6 (and almost identical to the 
Optimal Rule's), as can be seen in Figure 1. When 
the data are generated according to the LIMMA 
model, the likelihood ratio tests derived from the RR 
and RG models have nearly identical performance in 
terms of power as those of the moderated-t statistics 
and the LIMMA Optimal Rule for all values of vq 
(figure not shown). 

As expected, our simulations also showed that the 
average power in the homogeneous error variance 
models (RH, FH) decreases as the error variance 
variability increases. In general, the random gene 
models (RR, RF, RH) demonstrate higher average 
power than their corresponding fixed gene counter- 
parts. Notice also that the performance of moderated- 
t and the FR statistics are almost identical. 

The second performance assessment method did 
not require computing empirical quantiles, and was 
based on local f.d.r. criteria. Efron et al. (2001) and 
Efron (2005) defined local f.d.r. as 

(18) f.d.r.(y 9 ) = Pr(null|y = y 9 ) 

for the posterior probability of a gene g being in the 
null group. Note that this is precisely 1 — pi >g (y g ), 



where Pi, g (y g ) is given by (3). Since p\ can only 
be estimated in the random-mean models, we only 
considered the local f.d.r. statistics associated with 
RR, RF and RH. For comparison, we also considered 
the local f.d.r. statistics for RG and the Optimal 
Rule, and two types of B statistics computed by the 
limma package to differentiate between those com- 
puted with the default value of p\ = 0.01 [referred to 
as "Limma(O.Ol)"] and those computed with the es- 
timated value of pi [referred to as "Limma(pi)"]. We 
also included local f.d.r. statistics computed from 
the locf dr (Efron, Turnbull and Narasimhan, 2008) 
R package (referred to as "Efron" for simplicity). 

To evaluate the performance of these procedures, 
we looked at two complementary metrics. The first is 
the measure of accuracy, defined by the ratio ( TP + 
TN)/(P + N) as in Hong (2009), where P and N 
are the total numbers of non-null and null genes, 
respectively, and TP + TN is the sum of correct 
classifications (true positives plus true negatives). 
The second metric is the false discovery rate, defined 
by FP/(FP + TP), where FP is the total number 
of false positives. Clearly, our goal is to maximize 
the accuracy while maintaining a low false discovery 
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Fig. 2. Accuracy (left) and false discovery rate (right) for data generated under the LEMMA model with m = na = 6, S = 100 
samples, G = 2000 genes, and ip = 3, pi = 0.05 probability of non-null status, and high error variance variability ( CV = 3.16 ). 



rate. To compare different methods, we computed 
the accuracy and FDR for a range of posterior null 
probability thresholds (between and 0.5). A gene is 
declared as non-null if its posterior null probability 
is below the selected threshold. Note that when the 
threshold is 0, all genes are declared as null and we 
obtain accuracy of 1 — p\. As we increase the thresh- 
old, the total number of detections increases, and if 
we let the threshold be 1, all genes are declared as 
non-null (and the accuracy is p\). 

Figures 2 and 4 demonstrate that when the data 
are generated under the LEMMA model, the RR 
procedure achieves the highest level of accuracy for 
any posterior probability threshold in the range [0, 
0.5], and is practically the same as the Optimal Rule. 
It has only a slightly higher FDR, compared with 
the Optimal Rule. Note that RF has high accuracy, 
but very high FDR, indicating it is too liberal and 
declares too many genes as non-null. 

We also observe that the RR and RG procedures 
are quite similar, which is an indication that the 
choice of the non-null variance model (either cr^ as 
in LEMMA, or v§o 2 e g as in LIMMA) does not have 
a significant impact on the performance. We also 
notice that when the limma package is used with 



the estimated value of p±, instead of the default, the 
accuracy is greatly improved, with a relatively small 
increase in FDR. Still, the RR procedure (under the 
LEMMA data generation scheme) is clearly superior 
to all other methods. 

Interestingly, when the data are generated under 
the LIMMA model, we get similar results — the RR 
procedure achieves higher accuracy, and only a rel- 
atively small increase in false discoveries (see Fig- 
ures 3 and 5). It is also interesting that the limma 
procedure does not achieve the performance of its 
Optimal Rule, and we believe this is due to inaccu- 
rate estimation of p\ , as demonstrated below. Note 
that lemma uses maximum likelihood estimation for 
all the model parameters, while limma uses ad-hoc 
methods to estimate p\ and vq. In summary, lemma 
and its RG variant are competitive with limma when 
LIMMA is the true data generating model, but they 
are clearly superior when LEMMA is the true data 
generating model. Furthermore, the additional pa- 
rameters (T,ip) in the LEMMA model do not add 
to the computational complexity, as the maximum 
likelihood estimators are obtained via a simple, and 
fast EM algorithm. 

To conclude this subsection, we remark that al- 
though it is possible to compute posterior probabil- 
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Fig. 3. Accuracy (left) and false discovery rate (right) for data generated under the LEMMA model with ni = ni = 6, S= 100 
samples, G = 2000 genes, and ip = 3, pi = 0.25 probability of non-null status, and high error variance variability ( CV = 3.16 ). 
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Fig. 4. Accuracy (left) and false discovery rate (right) for data generated under the LIMMA model with n\ — n2 — 6, S = 100 
samples, G = 2000 genes, and vo = 1, pi = 0.05 probability of non-null status, and high error variance variability ( CV = 3.16 ). 
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Fig. 5. Accuracy (left) and false discovery rate (right) for data generated under the LIMMA model with n\ — ri2 — 6, S — 100 
samples, G = 2000 genes, and vq = 1, pi = 0.25 probability of non-null status, and high error variance variability ( CV = 3.16 ). 



ities using the limma package (which involves plug- 
ging in the estimates for vq and p±), in practice, 
inference via the limma package is often frequentist 
in nature (using the p-values, computed from the 
t-statistics, returned by the eBayes function). 

5.3 Estimation Performance 

We also analyzed the parameter estimation per- 
formance of the lemma software, and we found it to 
be very accurate when the data are generated under 
the LEMMA model. However, since this is not un- 
expected, we chose to present a more interesting re- 
sult. Recall that both LEMMA and LIMMA require 
estimation of the non-null prior probability, p\. We 
compared the estimation of this important param- 
eter under those two data generation models using 
four estimation methods, including lemma, convest 
(from the limma package) and two estimation proce- 
dures available in the locfdr package — denoted by 
EF-MLE and EF-CME. Smyth (2004) argues that 
the mixture proportion parameter is difficult to es- 
timate in the model leading to the B-statistic, and 
our simulations verify that the estimates of p\ pro- 
duced by the limma package are significantly biased. 
(As noted earlier, the limma package uses value of 



pi = 0.01, rather than an estimate.) Figure 6 shows 
that when p\ = 0.05 lemma tends to slightly over- 
estimate the parameter, while the other methods 
tend to underestimate it. This is in agreement with 
the observation that lemma achieves higher accu- 
racy, and has a slightly higher FDR. We also point 
out that both estimation methods available in the 
locfdr package not only underestimate p\, but also 
give unreasonable (negative) estimates. The lemma 
estimation procedure is significantly better than the 
other three for higher values of pi, even when the 
data are generated under the LIMMA model. 

6. EXAMPLES 

Using the lemma software, we fitted the LEMMA 
model to several microarray data sets. For illustra- 
tion purposes, we provide our analysis of two pub- 
licly available, two-channel gene expression microar- 
ray data sets that were previously analyzed: the 
ApoAl data (Callow et al., 2000) and the Colon 
Cancer data (Alon et al., 1999). 

6.1 ApoAl Data 

The ApoAl experiment (Callow et al., 2000) used 
gene targeting in embryonic stem cells to produce 
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mice lacking apolipoprotein A-l, a gene known to 
play a critical role in high density lipoprotein (HDL) 
cholesterol levels. Originally, 5600 expressed sequence 
tags (EST) were selected. In our analysis, we used 
the data and normalization method provided with 
the limma R package (Smyth, 2005), which consists 
of 5548 ESTs, from 8 control (wild type "black six") 
mice and 8 "knockout" (lacking ApoAl) mice. Com- 
mon reference RNA was obtained by pooling RNA 
from the control mice, and was used to perform ex- 
pression profiling for all 16 mice. Note that the cur- 
rent version of the limma user's guide refers to a 
larger data set which contains 6384 ESTs. Qualita- 
tively speaking, using the larger data set does not 
yield different results (in terms of detecting signifi- 
cant genes). 



The response of interest, Uij g , is the log 2 fluores- 
cence ratio (with respect to the common reference) 
where g is one of 5548 genes, j = 1, ...,8 (mouse 
number), and % is the population index (control and 
knockout). Using the EM algorithm, we obtained es- 
timates for the parameters in our LEMMA model. 
Figure 7(a) depicts the histogram of the 5548 d g 
statistics. The smooth black curve shows the fit- 
ted mixture distribution, drawn using the average 
estimated error variance. The smooth blue and red 
curves correspond to the average fitted distributions 
of the null and non-null groups, respectively. Per- 
gene fitted distributions are plotted in light colors 
(note that the non-null probability is very small, so 
only gene-specific distributions of the null group, in 
light blue, can be observed in this case). The mean- 
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Fig. 7. (a) Histogram of the 5548 d g statistics from the ApoAI data set and the fitted distributions, (b) The Ben- 
jamini-Hochberg adjusted p-values for all genes. Using an FDR level of 0.2, we detect 25 non-null genes, (c) Histogram 
of the m g statistics and the fitted distribution, (d) The RR test statistics of all the genes. Using a 0.2 threshold for the 
posterior probability, we declare 9 genes to be non-null. 



effect parameter estimates we obtained are f = 0.007 
and ip = 0.682, <rj = 0.874. 

Figure 7(c) depicts the histogram of the m g statis- 
tics and the fitted distribution. The estimates for the 
shape and scale parameters of the error variance dis- 
tribution are 1.87 and 11.11, respectively. The em- 
pirical mean and variance of {crf i9 } are 0.078 and 
0.004. 

Using the lemma package, we obtained the param- 
eter estimates, and computed the gene-specific pos- 
terior probabilities and the p-values for the hypothe- 
ses that genes are in the null group. Figure 7(b) 
depicts the Benjamini-Hochberg adjusted p-values. 
The red, solid points represent the genes that were 
declared non-null, using a (liberal) FDR threshold of 
0.2. Using the FDR criteria, we detected 25 non-null 
genes. 

Using the posterior probabilities derived from the 
LEMMA (RR) model and Efron's 0.2 threshold for 
local f.d.r., we detected 9 non-null genes, including 
the ApoAI gene and others that are closely related 



to it. The top eight genes had local f.d.r. values 
of nearly zero, while the ninth had a much higher 
value of 0.08. Figure 7(d) depicts the RR local f.d.r. 
statistics, and the red, solid points represent the 
genes that were declared non-null using a local f.d.r. 
threshold of 0.2. The top eight genes (using either 
the FDR or the local f.d.r. criteria) are also iden- 
tified (among others) when using the limma and 
locf dr R packages, and were confirmed to be differ- 
entially expressed in the knockout versus the control 
line by an independent assay. 

Interestingly, assuming no other genes are in the 
non-null group, the true value of p\ is 0.00144, and 
the estimate obtained from lemma is 0.0039, while 
Efron's estimates using the MLE and CME meth- 
ods are —0.036 and —0.083, respectively. As we men- 
tioned earlier, by default the limma R package does 
not provide an estimate for pi, and uses a value of 
0.01. However, using the convest function, limma 
provides the estimate p\ = 0.30. When one uses the 
larger ApoAI data set currently referred to by the 
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limma user's guide (with 6384 ESTs), the estimate 
for pi is 0.134. 

6.2 Colon Cancer Data 

The data analyzed by Alon et al. (1999) consists 
of 2000 ESTs in 40 tumor and 22 normal colon 
tissue samples. Of the 40 patients involved in the 
study, 22 supplied both tumor and normal tissue 
samples. In their analysis, Alon et al. (1999) used 
an Affymetrix oligonucleotide array complementary 
to more than 6500 human genes and expressed se- 
quence tags (ESTs), and a two-way clustering method 
to identify families of genes and tissues based on ex- 
pression patterns in the data set. Do, Miiller and 
Tang (2005) used a Bayesian mixture model to ana- 
lyze the same data set and estimated the probabil- 
ity of differential expression. Using empirical Bayes 
methods, they obtained a point estimate po = 0.39 
and contrasted it with the posterior marginal prob- 
ability distribution of po from the nonparametric 
Bayesian model, which they fit using MCMC sim- 
ulations. The empirical Bayes estimate for po was 
far out in the right tail of the posterior distribution, 
which, they argued, might lead to underestimating 
the posterior probability of being in the non-null 
group (differentially expressed genes). They propose 
using posterior expected FDR (Genovese and Wasser- 
man, 2002) thresholds to calibrate between a desired 
false discovery rate and the number of significant 
genes. For example, with FDR = 0.2, they find 1938 
non-null genes. 

Using lemma and assuming the two-group LEMMA 
model, we obtain p\ = 0.36. According to this model, 
the (non-null) mean effect of the gene-specific term 
is estimated by ijj = —0.04 (and the variance by = 
0.24), and the fitted two-group mixture distribu- 
tion is shown in Figure 8(a). The near-zero mean 
of the non-null mixture component suggests that 
there may be two non-null groups (over- and under- 
expressed groups of genes). We fitted the three-group 
variant of the LEMMA model to the data, and ob- 
tained pi = 0.22, p 2 = 0.12, and $ = -0.33, oj = 0.15 
[see Figure 8(b)]. In Figure 8(a) and (b) the light 
blue and purple curves represent the (per gene) fit- 
ted distributions for the null and non-null groups, 
respectively. The smooth black curve shows the fit- 
ted mixture distribution, drawn using the average 
estimated error variance. 

The three-group model allows for asymmetry in 
the proportions of over- and under-expressed genes. 
We see no reason to assume that these proportions 



should be equal. However, we find in simulations 
that if they are indeed equal, our procedure esti- 
mates them accurately. We have observed that if the 
true model has two non-null groups, then estimating 
it assuming two modes results in an estimate of ip 
that is biased toward and an inflated a 2 ^ (as seen 
in this case), and that this could lead to fewer true 
discoveries. 

In this data set, the empirical mean and variance 
of m g are 1.00 and 0.17, respectively, with estimates 
a = 10.42 and /3 = 0.11. Figure 8(c) shows the his- 
togram of the m g statistics and the fitted distribu- 
tion. 

The "volcano plot" in Figure 8(d) depicts the pos- 
terior null probability of genes based on the three- 
group LEMMA model versus the d g statistics. Us- 
ing the null posterior probability threshold of 0.2, 
we detect 170 non-null genes, while using the FDR 
method (with a threshold of 0.2) we get 155 genes. 
Detecting non-null genes in a typical microarray gene 
expression analysis involves setting a minimum fold- 
change threshold, in addition to setting the level at 
which the False Discovery Rate is controlled. For 
instance, requiring that \d g \ > 1 and controlling the 
False Discovery Rate at 0.1, we detect 61 non-null 
genes, all of which were detected by at least one 
method in Su et al. (2003). 

7. DISCUSSION 

In the previous sections we demonstrated that our 
modeling framework can lead to six different test 
statistics depending on the assumptions imposed on 
the gene-specific effects. Interestingly, the test statis- 
tics associated with these models have been consid- 
ered independently in the literature in various forms, 
but to our knowledge, this is the first time they 
have been categorized as special cases of the same 
model. The LEMMA (RR) model, in which both 
the non-null gene-specific effects and gene-specific 
variances are modeled as random variates, leads to 
James-Stein-type (shrinkage) estimation of the pa- 
rameters. Specifically, the statistics derived from the 
RR model enjoy shrinkage in both the numerator 
and denominator of a posterior i-statistic, result- 
ing in powerful test statistics while maintaining few 
false positives in our simulation studies. Using a 
Laplace approximation to make the EM algorithm 
tractable, our approach yields stable parameter es- 
timates, even for the notoriously difficult parame- 
ter pi . 
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Fitting the Colon Cancer Data (Alon, 1999) 
(a) one nonnull group (b) two nonnull groups 
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(c) m g distribution 



(d) volcano plot 
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Fig. 8. Histograms of the 2000 d g statistics from the Alon et al. (1999) data set and the fitted distributions, assuming (a) 
a two-group model, or (b) three-group model, (c) Histogram of the m g statistics and the fitted distribution, (d) Volcano plot, 
showing the posterior null probabilities by d g . 

Since our approach is model-based, it can be eas- 
ily generalized to other situations. For example, as 
stated earlier, the methods described in this pa- 
per can be extended to deal with multiple treat- 
ments, paired tests (one group) and multiple non- 
null components. Furthermore, it is straightforward 
to add fixed-effect covariates to the model. We are 



currently working on the next release of the lemma 
package which will include this feature, in addition 
to within-group analysis, new plotting and export- 
ing functions, and confidence intervals for parameter 
estimates. Extending the model to handle multivari- 
ate responses is also being investigated. 
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APPENDIX 

In this section we provide details on some of our 
previous derivations, and elaborate on the case of 
multiple treatments. 

A.l Empirical Bayes Estimates for a and (3 

To obtain an estimate of the error variance in the 
random error case, recall that 



I fg ^cr e g 
Gamma ( — , — — — 

2 Jg 



(19) 

Jg 

We maximize the marginal density of m g numer- 
ically to obtain maximum likelihood estimates of a 
and (5. Given the conditional distribution in (19), 
we find the marginal density of m g by integrating 



out cr 2 „. Specifically, 
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The final equality in (20) results from noting that 
the integral in the third equality is proportional to 
a Gamma(/ s /2 + a, [/3 _1 + m g f g /2]~ 1 ) density We 
maximize ^2 g ^og(f(m g )) with respect to a and j3 to 

obtain the empirical Bayes estimates a and /3. 
The joint distribution of m g and a £) g is given by 
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So, conditional on m g , 

a~l ~ Gamma(a + / g /2, (m ff / fl /2 + 
Hence, the conditional expectation is 
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Note that using the approximation of the mode, 
fh ~ [(a + 1)0] ~, in both the posterior mean and 
posterior mode yields a shrinkage-estimator form. 
Equivalently, we could replace (a + 1) with (a — 1) 
in the conditional expectation and the conditional 
mode, and obtain shrinkage toward the sample mean 
of {rrig}. 

A. 2 Maximum Likelihood Estimation of <fi 

Recall that in the RR method we use the Laplace 
approximation (8), hence, the (approximate) com- 
plete likelihood is 
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with log-likelihood 
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The estimates (11)-(14) are obtained by maximiz- 
ing the log-likelihood with respect to the parame- 
ters, 4>. 

Although the Laplace approximation is not neces- 
sary in the RF and RH models, note that the com- 
plete likelihoods and log-likelihoods for the these 
models are identical to equations (21) and (22), with 
(jg replaced by a 2 and a 2 e (as defined in Section 3.2), 
respectively. 

A. 3 Multiple Treatments 

In the general case we assume t > 2 treatments i = 
1, 2, . . . , t assigned to t groups of ni )S , n2, 9 , . . . , n t) g 
subjects indexed by ji = 1, . . . , m >g , . . . ,j t = 1, • • • , 
n t: g, and we use the model defined by (1) and (2). 
Here, we impose a standard (fixed effect) constraint 



t 



0. 



The distributions for the gene-specific effects in the 
multiple-treatment case are assumed to follow a nor- 
mal distribution, 

V s ~i.i.d. iW,4(T-J<)), 



To estimate the rest of the parameters in the LEMMA 
model, we use the (t — 1-dimensional vector) test 
statistics 
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Derivations similar to the ones we used to obtain 
the estimates in Section 3 lead to the same estimate 
for pi and to the following estimates, analogous to 
(12) and (13): 
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The update for a\ is the solution to the equation, 
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t-dimensional vector, and a 2 ,, is a scalar. The test 



statistic nig is defined as 
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where / 9 = m g + • • • + nt g — t, and we use m g as 
before, to estimate a and /3. 
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The likelihood ratio test statistic has a similar 
form as (17), 

^ = |i-A 9 r 1 / 2exp |_i r '(A - 1 A; 1 )r} 

•exp{-^ 2 (H^)'(H</,)}, 

where 

r = [A s (d g -Hr) + (I-A g )(HV>)], 
K g = (A A + A ) _1 A A , 
I-A g = (A j4 + Ao)~ 1 A . 
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